Skip to content

Coerce temporal operands in IN and BETWEEN when leastRestrictive finds no common type#5513

Merged
LantaoJin merged 2 commits into
opensearch-project:mainfrom
LantaoJin:fix/ppl-temporal-in-between
Jun 5, 2026
Merged

Coerce temporal operands in IN and BETWEEN when leastRestrictive finds no common type#5513
LantaoJin merged 2 commits into
opensearch-project:mainfrom
LantaoJin:fix/ppl-temporal-in-between

Conversation

@LantaoJin
Copy link
Copy Markdown
Member

@LantaoJin LantaoJin commented Jun 4, 2026

Description

visitIn and visitBetween call leastRestrictive directly instead of going through the CoercionUtils path that comparison operators use. leastRestrictive returns no common type when temporal operands are represented differently — e.g. a standard Calcite TIMESTAMP field against EXPR_DATE UDT bounds/values (ts between date('...') and date('...'), ts in (DATE '...', ...)) — so both predicates rejected such queries with "expression types are incompatible" even though comparisons coerce the same mix.

Both now fall back to CoercionUtils.widenArguments (the comparison-operator path) when leastRestrictive yields null, scoped to all-temporal operands so genuinely incompatible mixes (e.g. age between '35' and 38.5) still raise SemanticCheckException. Enables the previously-undiscovered testDateBetween (was missing its @test annotation) and adds testDateIn plus a CoercionUtils unit test covering the plain-TIMESTAMP + EXPR_DATE mix.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…s no common type

visitIn and visitBetween call leastRestrictive directly instead of going through the
CoercionUtils path that comparison operators use. leastRestrictive returns no common
type when temporal operands are represented differently — e.g. a standard Calcite
TIMESTAMP field against EXPR_DATE UDT bounds/values (`ts between date('...') and
date('...')`, `ts in (DATE '...', ...)`) — so both predicates rejected such queries
with "expression types are incompatible" even though comparisons coerce the same mix.

Both now fall back to CoercionUtils.widenArguments (the comparison-operator path) when
leastRestrictive yields null, scoped to all-temporal operands so genuinely incompatible
mixes (e.g. `age between '35' and 38.5`) still raise SemanticCheckException. Enables the
previously-undiscovered testDateBetween (was missing its @test annotation) and adds
testDateIn plus a CoercionUtils unit test covering the plain-TIMESTAMP + EXPR_DATE mix.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit d0faf70)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The temporal IN rewrite (lines 239-244) converts field IN (v1, v2, ...) into field = v1 OR field = v2 OR .... If the value list is empty, this produces an empty OR, which may not behave identically to an empty IN. While empty IN lists are uncommon, the original makeIn path handled them; the rewrite should verify the list is non-empty or document that empty lists are unsupported for temporal fields.

if (TEMPORAL_TYPES.contains(fieldExprType)) {
  List<RexNode> equalities =
      valueList.stream()
          .map(value -> PPLFuncImpTable.INSTANCE.resolve(context.rexBuilder, "=", field, value))
          .toList();
  return context.relBuilder.or(equalities);
Possible Issue

pointsAsTimestampOr (line 833) calls sarg.negate() for complemented-points Sargs but does not verify that negation yields a points-only Sarg. If negation produces ranges instead of points, sargPointValue(range.lowerEndpoint()) at line 842 will fail or produce incorrect results because the range may not be a single point. The caller at line 795 already negates the entire expression, so negating the Sarg here appears redundant and potentially incorrect.

private static QueryExpression pointsAsTimestampOr(RexCall call, SwapResult pair) {
  RexLiteral literal = (RexLiteral) call.getOperands().get(1);
  Sarg<?> sarg = requireNonNull(literal.getValueAs(Sarg.class), "Sarg");
  Set<? extends Range<?>> ranges =
      (sarg.isComplementedPoints() ? sarg.negate() : sarg).rangeSet.asRanges();
  List<QueryExpression> queryExpressions =
      ranges.stream()
          .map(
              range ->
                  QueryExpression.create(pair.getKey())
                      .equals(sargPointValue(range.lowerEndpoint()), true))
          .toList();
  if (queryExpressions.size() == 1) {
    return queryExpressions.getFirst();
  }
  return CompoundQueryExpression.or(queryExpressions.toArray(new QueryExpression[0]));
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Code Suggestions ✨

Latest suggestions up to d0faf70
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Handle empty IN value list

Handle the edge case where valueList is empty. An empty IN clause would produce an
empty OR, which may cause unexpected behavior. Consider returning a constant false
expression when the list is empty.

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java [238-245]

 ExprType fieldExprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(field.getType());
 if (TEMPORAL_TYPES.contains(fieldExprType)) {
+  if (valueList.isEmpty()) {
+    return context.rexBuilder.makeLiteral(false);
+  }
   List<RexNode> equalities =
       valueList.stream()
           .map(value -> PPLFuncImpTable.INSTANCE.resolve(context.rexBuilder, "=", field, value))
           .toList();
   return context.relBuilder.or(equalities);
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that an empty valueList would produce an empty OR expression, which could lead to unexpected behavior. Handling this edge case by returning a constant false expression is a valid improvement for robustness, though empty IN clauses are relatively uncommon in practice.

Medium
Validate non-empty operands list

Validate that operands is not empty before processing. An empty list would pass the
allTemporal check but may cause issues in widenArguments. Add a guard to return null
for empty operand lists.

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java [288-298]

 private static @Nullable List<RexNode> widenTemporalOperands(
     CalcitePlanContext context, List<RexNode> operands) {
+  if (operands.isEmpty()) {
+    return null;
+  }
   boolean allTemporal =
       operands.stream()
           .map(node -> OpenSearchTypeFactory.convertRelDataTypeToExprType(node.getType()))
           .allMatch(TEMPORAL_TYPES::contains);
   if (!allTemporal) {
     return null;
   }
   return CoercionUtils.widenArguments(context.rexBuilder, operands);
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion identifies a potential edge case where an empty operands list could cause issues. While allMatch on an empty stream returns true (which would be handled correctly by returning null), adding an explicit guard improves code clarity and defensive programming practices.

Low
Handle empty range set case

Handle the case where ranges is empty. An empty range set would produce an empty OR
expression, which may not represent the correct query semantics. Return an
appropriate constant expression (e.g., false for empty IN, true for empty NOT IN).

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java [833-849]

 private static QueryExpression pointsAsTimestampOr(RexCall call, SwapResult pair) {
   RexLiteral literal = (RexLiteral) call.getOperands().get(1);
   Sarg<?> sarg = requireNonNull(literal.getValueAs(Sarg.class), "Sarg");
   Set<? extends Range<?>> ranges =
       (sarg.isComplementedPoints() ? sarg.negate() : sarg).rangeSet.asRanges();
+  if (ranges.isEmpty()) {
+    return QueryExpression.alwaysFalse();
+  }
   List<QueryExpression> queryExpressions =
       ranges.stream()
           .map(
               range ->
                   QueryExpression.create(pair.getKey())
                       .equals(sargPointValue(range.lowerEndpoint()), true))
           .toList();
   if (queryExpressions.size() == 1) {
     return queryExpressions.getFirst();
   }
   return CompoundQueryExpression.or(queryExpressions.toArray(new QueryExpression[0]));
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion identifies that an empty ranges set would produce an empty OR expression. However, the suggested fix assumes QueryExpression.alwaysFalse() exists (which may not be the case), and the semantic correctness depends on whether empty ranges are valid in this context. The concern is valid but the solution needs verification.

Low

Previous suggestions

Suggestions up to commit b0a75e2
CategorySuggestion                                                                                                                                    Impact
General
Maintain consistent operand ordering

The order of operands matters when constructing the IN expression. The field should
be the first element, followed by the value list, to maintain consistency with the
original structure where field is tested against valueList. Currently, the field is
appended last and extracted from the end, which works but is less intuitive.

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java [244-251]

-List<RexNode> operands = new ArrayList<>(valueList);
+List<RexNode> operands = new ArrayList<>();
 operands.add(field);
+operands.addAll(valueList);
 List<RexNode> widened = widenTemporalOperands(context, operands);
 if (widened != null) {
-  RexNode widenedField = widened.get(widened.size() - 1);
-  List<RexNode> widenedValues = widened.subList(0, widened.size() - 1);
+  RexNode widenedField = widened.get(0);
+  List<RexNode> widenedValues = widened.subList(1, widened.size());
   return context.rexBuilder.makeIn(widenedField, widenedValues);
 }
Suggestion importance[1-10]: 4

__

Why: While the suggestion correctly identifies that operand ordering could be more intuitive, the current implementation is functionally correct and works as intended. The change would improve code readability slightly but has minimal impact on functionality or maintainability.

Low

…tead of string-collapsing them

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Persistent review updated to latest commit d0faf70

@LantaoJin LantaoJin merged commit cf14aba into opensearch-project:main Jun 5, 2026
40 checks passed
@LantaoJin LantaoJin deleted the fix/ppl-temporal-in-between branch June 5, 2026 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants